我们提出了一种新颖的体系结构,用于使用离散的潜在变量对以任务为导向的对话进行解释建模,以表示对话动作。我们的模型基于变异复发性神经网络(VRNN),不需要明确的语义信息注释。与以前的作品不同,我们的方法模型系统和用户单独转动并执行数据库查询建模,这使该模型适用于以任务为导向的对话,同时生成易于解释的可解释的可解释的潜在变量。我们表明,我们的模型在三个数据集中的困惑和BLEU方面优于先前的方法,我们提出了一种衡量对话成功的方法,而无需专家注释。最后,我们提出了一种新颖的方式来解释有关系统动作的潜在变量语义。
translated by 谷歌翻译
我们介绍了AARGH,这是一个面向任务的对话框系统,该系统结合了单个模型中的检索和生成方法,旨在改善对话框管理和输出的词汇多样性。该模型采用了一种新的响应选择方法,该方法基于动作感知训练目标和简化的单编码检索架构,该方法使我们能够构建端到端检索增强生成模型,在该模型中,检索和生成共享大多数参数。在Multiwoz数据集上,我们表明我们的方法与最先进的基线相比,在维持或改善状态跟踪和上下文响应生成性能的同时,产生了更多的输出。
translated by 谷歌翻译
我们提出了一种新颖的方法来通过使用具有不同个性类型的代理来生成脚本。为了管理脚本中的字符交互,我们采用了模拟的戏剧网络。关于多个标准的自动和人类评估表明,我们的方法的表现优于基于香草-GPT2的基线。我们进一步引入了一个新的指标,以根据自然语言推论评估对话一致性并证明其有效性。
translated by 谷歌翻译
The increasingly widespread adoption of large language models has highlighted the need for improving their explainability. We present context length probing, a novel explanation technique for causal language models, based on tracking the predictions of a model as a function of the length of available context, and allowing to assign differential importance scores to different contexts. The technique is model-agnostic and does not rely on access to model internals beyond computing token-level probabilities. We apply context length probing to large pre-trained language models and offer some initial analyses and insights, including the potential for studying long-range dependencies. The source code and a demo of the method are available.
translated by 谷歌翻译
This paper presents a conversational AI platform called Flowstorm. Flowstorm is an open-source SaaS project suitable for creating, running, and analyzing conversational applications. Thanks to the fast and fully automated build process, the dialogues created within the platform can be executed in seconds. Furthermore, we propose a novel dialogue architecture that uses a combination of tree structures with generative models. The tree structures are also used for training NLU models suitable for specific dialogue scenarios. However, the generative models are globally used across applications and extend the functionality of the dialogue trees. Moreover, the platform functionality benefits from out-of-the-box components, such as the one responsible for extracting data from utterances or working with crawled data. Additionally, it can be extended using a custom code directly in the platform. One of the essential features of the platform is the possibility to reuse the created assets across applications. There is a library of prepared assets where each developer can contribute. All of the features are available through a user-friendly visual editor.
translated by 谷歌翻译
We present the CUNI-Bergamot submission for the WMT22 General translation task. We compete in English$\rightarrow$Czech direction. Our submission further explores block backtranslation techniques. Compared to the previous work, we measure performance in terms of COMET score and named entities translation accuracy. We evaluate performance of MBR decoding compared to traditional mixed backtranslation training and we show a possible synergy when using both of the techniques simultaneously. The results show that both approaches are effective means of improving translation quality and they yield even better results when combined.
translated by 谷歌翻译
While modern Text-to-Speech (TTS) systems can produce speech rated highly in terms of subjective evaluation, the distance between real and synthetic speech distributions remains understudied, where we use the term \textit{distribution} to mean the sample space of all possible real speech recordings from a given set of speakers; or of the synthetic samples that could be generated for the same set of speakers. We evaluate the distance of real and synthetic speech distributions along the dimensions of the acoustic environment, speaker characteristics and prosody using a range of speech processing measures and the respective Wasserstein distances of their distributions. We reduce these distribution distances along said dimensions by providing utterance-level information derived from the measures to the model and show they can be generated at inference time. The improvements to the dimensions translate to overall distribution distance reduction approximated using Automatic Speech Recognition (ASR) by evaluating the fitness of the synthetic data as training data.
translated by 谷歌翻译
本文介绍了我们对CRAC 2022关于多语言核心分辨率的共享任务的方法。我们的模型基于最新的端到端核心分辨率系统。除了加入多语言培训之外,我们还通过提及头部预测提高了结果。我们还试图将依赖性信息集成到我们的模型中。我们的系统最终以$ 3^{rd} $ place。此外,我们在13个数据集中达到了最佳性能。
translated by 谷歌翻译
我们介绍了第一个机器学习引力波搜索模拟数据挑战(MLGWSC-1)的结果。在这一挑战中,参与的小组必须从二进制黑洞合并中识别出复杂性和持续时间逐渐嵌入在逐渐更现实的噪声中的引力波信号。 4个提供的数据集中的决赛包含O3A观察的真实噪声,并发出了20秒的持续时间,其中包含进动效应和高阶模式。我们介绍了在提交前从参与者未知的1个月的测试数据中得出的6个输入算法的平均灵敏度距离和运行时。其中4个是机器学习算法。我们发现,最好的基于机器学习的算法能够以每月1个的错误警报率(FAR)的速度(FAR)实现基于匹配过滤的生产分析的敏感距离的95%。相反,对于真实的噪音,领先的机器学习搜索获得了70%。为了更高的范围,敏感距离缩小的差异缩小到某些数据集上选择机器学习提交的范围$ \ geq 200 $以优于传统搜索算法的程度。我们的结果表明,当前的机器学习搜索算法可能已经在有限的参数区域中对某些生产设置有用。为了改善最新的技术,机器学习算法需要降低他们能够检测信号并将其有效性扩展到参数空间区域的虚假警报率,在这些区域中,建模的搜索在计算上很昂贵。根据我们的发现,我们汇编了我们认为,将机器学习搜索提升到重力波信号检测中的宝贵工具,我们认为这是最重要的研究领域。
translated by 谷歌翻译
本文概述了与CRAC 2022研讨会相关的多语言核心分辨率的共享任务。共同的任务参与者应该开发能够识别提及并根据身份核心重点聚集的训练系统。Corefud 1.0的公共版本包含10种语言的13个数据集,被用作培训和评估数据的来源。先前面向核心共享任务中使用的串联分数用作主要评估度量。5个参与团队提交了8个核心预测系统;此外,组织者在共享任务开始时提供了一个基于竞争变压器的基线系统。获胜者系统的表现优于基线12个百分点(就所有语言的所有数据集而言,在所有数据集中平均得分)。
translated by 谷歌翻译